HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC
This paper presents HALO 1.0, an open-ended extensible multi-agent software
framework that implements a set of proposed hardware-agnostic accelerator
orchestration (HALO) principles. HALO implements a novel compute-centric
message passing interface (C^2MPI) specification for enabling the
performance-portable execution of a hardware-agnostic host application across
heterogeneous accelerators. Experimental results from evaluating eight widely
used HPC subroutines on Intel Xeon E5-2620 CPUs, Intel Arria 10 GX FPGAs,
and NVIDIA GeForce RTX 2080 Ti GPUs show that HALO 1.0 allows a unified
control flow for host programs to run across all the computing devices with a
consistently top performance portability score, which is up to five orders of
magnitude higher than the OpenCL-based solution.
Comment: 21 pages
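The abstract does not define its performance portability score. As a minimal sketch, assuming it follows the widely used harmonic-mean definition (per-platform efficiencies between 0 and 1, with a score of 0 if any platform is unsupported), the calculation could look like the following. The function name and the efficiency values are hypothetical and are not taken from the paper.

```python
def performance_portability(efficiencies):
    """Harmonic mean of per-platform efficiencies (assumed metric definition).

    `efficiencies` maps a platform name to an efficiency in (0, 1], or to None
    when the application does not run on that platform.
    """
    if not efficiencies:
        raise ValueError("at least one platform is required")
    values = list(efficiencies.values())
    # An unsupported platform makes the application non-portable: score 0.
    if any(e is None or e <= 0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


# Hypothetical efficiencies for illustration only (not results from the paper).
example = {"cpu": 0.5, "fpga": 0.5, "gpu": 1.0}
score = performance_portability(example)
```

Because a harmonic mean is dominated by its smallest term, a single platform with very low efficiency pulls the whole score toward zero, which is one way scores for two solutions can differ by orders of magnitude.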
FLASH 1.0: A Software Framework for Rapid Parallel Deployment and Enhancing Host Code Portability in Heterogeneous Computing
In this paper, we present FLASH 1.0, a C++-based software framework for rapid
parallel deployment and enhancing host code portability in heterogeneous
computing. FLASH takes a novel approach in describing kernels and dynamically
dispatching them in a hardware-agnostic manner. FLASH features truly
hardware-agnostic frontend interfaces, which not only unify the compile-time
control flow but also enforce a portability-optimized code organization that
imposes a demarcation between computational (performance-critical) and
functional (non-performance-critical) codes as well as the separation of
hardware-specific and hardware-agnostic codes in the host application. We use
static code analysis to measure the hardware independence ratio of popular HPC
applications and show that up to 99.72% code portability can be achieved with
FLASH. Similarly, we measure the complexity of state-of-the-art portable
programming models and show that a code reduction of up to 2.2x can be achieved
for two common HPC kernels while maintaining 100% code portability with a
normalized framework overhead between 1% and 13% of the total kernel runtime. The
code is available at https://github.com/PSCLab-ASU/FLASH.
Comment: 12 pages
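The abstract reports a hardware independence ratio and a normalized framework overhead without giving their formulas. A minimal sketch, assuming the ratio is the share of hardware-agnostic source lines and the overhead is framework time divided by total kernel runtime, is shown below. Both function names, both definitions, and all numbers are assumptions for illustration and may differ from the paper's static analysis.

```python
def hardware_independence_ratio(agnostic_lines, total_lines):
    """Share of source lines that contain no hardware-specific code (assumed definition)."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 0 <= agnostic_lines <= total_lines:
        raise ValueError("agnostic_lines must lie between 0 and total_lines")
    return agnostic_lines / total_lines


def normalized_overhead(framework_seconds, total_kernel_seconds):
    """Framework time as a fraction of total kernel runtime (assumed definition)."""
    if total_kernel_seconds <= 0:
        raise ValueError("total_kernel_seconds must be positive")
    return framework_seconds / total_kernel_seconds


# Hypothetical measurements for illustration only (not results from the paper).
ratio = hardware_independence_ratio(agnostic_lines=950, total_lines=1000)
overhead = normalized_overhead(framework_seconds=0.05, total_kernel_seconds=1.0)
```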
Enhanced Low-resolution LiDAR-Camera Calibration Via Depth Interpolation and Supervised Contrastive Learning
Motivated by the recent increase in applications of low-resolution LiDAR, we
target the problem of low-resolution LiDAR-camera calibration in this work. The
main challenges are two-fold: sparsity and noise in point clouds. To address
the problem, we propose to apply depth interpolation to increase the point
density and supervised contrastive learning to learn noise-resistant features.
The experiments on RELLIS-3D demonstrate that our approach achieves average
mean absolute translation/rotation errors of 0.15 cm/0.33° on
32-channel LiDAR point cloud data, significantly outperforming all
reference methods.
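The abstract does not say which interpolation scheme is used to densify the point cloud. As a minimal sketch of the general idea, assuming the LiDAR points have already been projected into a depth image, the code below fills missing pixels in one image row by linear interpolation between the nearest valid measurements. The function name and the sample values are hypothetical, and the paper's actual method may differ.

```python
def interpolate_depth_row(row):
    """Linearly fill gaps between valid depth samples in one image row.

    `row` is a list of depths where None marks a pixel with no LiDAR return.
    Gaps before the first or after the last valid sample are left unfilled.
    """
    filled = list(row)
    valid = [i for i, depth in enumerate(row) if depth is not None]
    # Walk over each pair of neighbouring valid samples and fill between them.
    for left, right in zip(valid, valid[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span
            filled[i] = row[left] + t * (row[right] - row[left])
    return filled


# Hypothetical sparse row for illustration only.
sparse = [None, 2.0, None, None, 5.0, None]
dense = interpolate_depth_row(sparse)
```

Leaving the outer gaps unfilled avoids extrapolating depth where no measurement supports it; a real pipeline would also need to handle depth discontinuities at object boundaries.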